Introduction

Mental illness is one of the leading causes of disability in the world. A report evaluating US health from 1990 to 2010 found that the disease burden of mental illness was among the highest of all diseases [1]. Disease burden refers to the impact of a health problem measured by financial cost, mortality, morbidity, and other factors. Most patients with serious mental diseases or disorders spend years struggling without the ability to live a normal life. This is a heavy burden for patients, their families, and society.

New York City faces the same problem, with a high rate of mental illness. Several studies have indicated a possible relationship between urban life and a higher risk of mental illness. In 2015, New York City launched an action plan called “ThriveNYC”[2], aiming to change the way people think about mental health and to provide more accessible services citywide.

The goals of this project are:

  1. To gain insight into the mental health situation in NYC;

  2. To discover potentially effective interventions and to guide the distribution of NYC funding and services, with the ultimate aim of improving New Yorkers’ mental health.

The data come from Health Data NY, the Statewide Planning and Research Cooperative System (SPARCS)[3]. The raw data include the NY hospital inpatient discharges for all diseases from 2009 to 2014.

This file has 3 sections.

Section 1: Explore the 2014 dataset and discover interesting patterns.

Section 2: Explore the combined dataset from 2009-2014; analyze the economic burden caused by mental diseases in NYC.

Section 3: Three Final plots.

References:
[1] US Burden of Disease Collaborators. The state of US health, 1990-2010: burden of diseases, injuries, and risk factors. JAMA, 310(6): 591-608, 2013.
[2] https://thrivenyc.cityofnewyork.us/
[3] https://health.data.ny.gov/Health/Hospital-Inpatient-Discharges-SPARCS-De-Identified/mpue-vn67
——————————————

Section 1: Explore the 2014 dataset and discover interesting patterns.

1.1 Dataset Exploration

  • The 6 datasets are organized by year in a similar format, so I explore the 2014 raw data first.
  • Load the 2014 dataset, which contains patient information for hospitalizations of all diseases in NY.
df14 <- 
    read.csv("./Data/Hospital_Inpatient_Discharges__SPARCS_De-Identified___2014.csv")
  • Subset data to find those with Mental Diseases & Disorders in NYC
    1. Subset data with patients from NYC
    2. Subset data with patients of mental diseases and disorders

    Hospitalizations for mental diseases and disorders are extracted from the dataset by DRG code, based on the diagnosis. Neurological diseases and drug & alcohol abuse are excluded.

    Details of DRG code used are listed here: https://github.com/super-penguin/SPARCS-health-data.

# Subset the Data
# Subset the inpatient hospitalizations for Mental Diseases & Disorders in NYC in 2014
# The subset is based on DRG code
Mental.Code<- c(740, 750, 751, 752, 753, 754, 755, 756, 758, 759, 760, 561, 766)
County <- c("Bronx","Kings", "Manhattan", "Queens", "Richmond")
df14.NYC <- subset(df14, Hospital.County %in% County)
df14.mental<- subset(df14.NYC, APR.DRG.Code %in% Mental.Code)

1.2 Compare the top 10 diseases in NYC 2014

1.2.1 Plot top 10 diseases with highest hospitalization in NYC

  • Remove the hospitalization data for Newborn, Vaginal delivery and Cesarean delivery.
    • These 3 are not caused by diseases.
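# destring() is assumed to come from the taRifx package;
# %>%, group_by(), summarise() and arrange() come from dplyr
library(taRifx)
library(dplyr)

# Convert the costs and charges ($) to numeric values for further exploration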
df14.NYC$Total.Charges<- destring(df14.NYC$Total.Charges)
df14.NYC$Total.Costs<- destring(df14.NYC$Total.Costs)
df14.mental$Total.Charges<- destring(df14.mental$Total.Charges)
df14.mental$Total.Costs<- destring(df14.mental$Total.Costs)

# Group the dataset by DRG code and count the patients for each disease
df14.NYC.DRG.Group<- df14.NYC %>%
    group_by(APR.DRG.Code, APR.DRG.Description) %>%
    summarise(Total.Patients.Number = n()) %>%
    arrange(Total.Patients.Number)
  • Bar Plot of the Top 10 Diseases in NYC 2014

In this figure, Schizophrenia and Bipolar Disorders both belong to mental disorders. Two of the top 10 diseases are mental disorders, and Schizophrenia is the third most common. This underlines the importance of understanding the mental health situation in NYC.
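The ranking behind this plot can be sketched in base R. This is a minimal illustration on a made-up data frame, not the code used for the figure: the descriptions and counts are invented, and `top_n_groups` is a hypothetical name. Note that the dplyr pipeline in 1.2.1 sorts in ascending order, so there the largest groups sit at the bottom of the table; the sketch sorts in descending order and takes the head instead.

```r
# Toy stand-in for df14.NYC; descriptions and counts are made up for illustration.
toy <- data.frame(
  APR.DRG.Description = c(rep("Schizophrenia", 5), rep("Heart failure", 7),
                          rep("Septicemia", 3), rep("Bipolar disorders", 4))
)

# Count discharges per diagnosis group.
counts <- as.data.frame(table(toy$APR.DRG.Description), stringsAsFactors = FALSE)
names(counts) <- c("APR.DRG.Description", "Total.Patients.Number")

# Sort in descending order so the most frequent groups come first.
counts <- counts[order(-counts$Total.Patients.Number), ]

# Keep the n most frequent groups (n = 10 in the report, 3 in this toy example).
top_n_groups <- head(counts, 3)
print(top_n_groups)
```

With the real data, the same steps would be applied after removing the Newborn and delivery groups.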

1.2.2 Plot the Number of Patients with Different Mental Diseases & Disorders by DRG Code

Schizophrenia, Bipolar Disorders and Major Depressive Disorders are the three most common mental illnesses in NYC in 2014.

1.2.3 Plot the Fraction of Patients Admitted Through Emergency Department for the Top 10 Diseases in NYC 2014

Comparing the emergency admission rates of the top 10 diseases in NYC, Schizophrenia and Bipolar Disorders are not the highest, but both lie in the higher range (around 70%).

The high emergency admission rate of mental diseases & disorders points to the importance of early action in improving mental health. Strengthening early counseling services and early response teams might be an effective way to give patients the help they need and to prevent their condition from getting worse.
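The fraction admitted through the emergency department can be computed per diagnosis group as the mean of a logical flag. The sketch below uses a toy data frame; the column name `Emergency.Department.Indicator` and its "Y"/"N" coding are assumptions about the SPARCS file, and the rows are invented.

```r
# Toy data; the column name and the "Y"/"N" coding are assumed, the rows are made up.
toy <- data.frame(
  APR.DRG.Description = c(rep("Schizophrenia", 4), rep("Heart failure", 2)),
  Emergency.Department.Indicator = c("Y", "Y", "Y", "N", "Y", "N")
)

# TRUE where the patient was admitted through the emergency department.
via_ed <- toy$Emergency.Department.Indicator == "Y"

# The mean of a logical vector is the fraction of TRUE values in each group.
ed_fraction <- sapply(split(via_ed, toy$APR.DRG.Description), mean)
print(ed_fraction)
```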

1.2.4 Compare and Plot the Total Charges of the Top 10 Diseases in NYC 2014

The average total charge for Schizophrenia is the third highest among the top 10 diseases. It is a huge financial burden on both the patients’ families and our city.

1.2.5 Compare the total charges and costs (log scale) distribution of all the diseases vs. mental diseases in NYC

There is not much difference between the distributions of total charges and costs for mental diseases and for all other diseases.

1.2.6 Compare the hospitalization stay length distribution of all the diseases vs. mental diseases in NYC

Compared with the distribution for all diseases in NYC, the length of hospitalization for mental diseases is shifted toward longer stays. The peak of the distribution is similar to that of other diseases, but mental diseases have more cases with longer stays.
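One practical caveat when working with stay lengths: depending on the R version and the `read.csv` settings, `Length.of.Stay` may be read as a factor or as text rather than as a number, and calling `as.numeric()` on a factor returns internal level codes instead of days. The sketch below shows one way to convert such a column safely. `parse_stay` is a hypothetical helper, and the capped value "120 +" is an assumption about how the longest stays are coded in the file.

```r
# Hypothetical helper: convert a Length.of.Stay column to numeric days.
parse_stay <- function(x) {
  x <- as.character(x)          # use the labels, not the internal factor codes
  x <- gsub("[^0-9]", "", x)    # drop every non-digit, e.g. the " +" in "120 +"
  as.numeric(x)
}

# Toy values; "120 +" is an assumed coding for capped stays.
stay <- factor(c("3", "12", "120 +", "7"))

days <- parse_stay(stay)    # stay lengths in days
codes <- as.numeric(stay)   # factor level codes, which are NOT days
print(days)
print(codes)
```

Treating a capped value as exactly 120 days slightly understates the mean, which matters most for the groups with the longest stays.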

1.2.7 Compare and Plot the Length of Hospitalization of the Top 10 Diseases in NYC 2014

The lengths of hospitalization for Schizophrenia and Bipolar Disorders are both in the higher range. However, this figure might understate the actual long-term burden of mental diseases, because most patients still need extra care at home or in specialized facilities after being discharged from the hospital.

1.2.8 Observations and Reflections

In this chapter, mental disorders are compared with the top 10 diseases in NYC on several aspects. In 2014, schizophrenia alone was already the third leading cause of hospitalization in NYC. Besides the striking number of patients with mental problems, the high charges and long hospitalizations are a heavy burden on both the patients and our city. Most patients with severe mental disorders lose the ability to work and live by themselves for years or even a lifetime, and they constantly need extra care at additional cost.

From these figures, it is clear that mental health is one of the urgent problems facing our city, and effective data sharing should be coordinated to come up with new strategies. The next part focuses on exploring the hospitalization data for mental diseases in NYC in 2014.

1.3. Discover the vulnerable groups in NYC who are more likely to suffer from mental problems

1.3.1 Bar Plot of Patients by age and racial group

  • Group patients data by age, gender and race
df14.fc_by_age_race <- df14.mental %>%
    filter(Race != "Multi-racial") %>%
    group_by(Age.Group, Race, Gender) %>%
    summarise(mean_days = mean(as.numeric(Length.of.Stay)),
              mean_costs = mean(as.numeric(Total.Costs)),
              n = n()) %>%
    arrange(Age.Group)
  • Data Visualization - The difference in the number of patients by age and racial group

  • Data Visualization - The difference in the number of patients by age and gender group

There are more patients with mental diseases in the age groups from 18 to 69. This trend corresponds to the age distribution of the population. As for gender, males are more likely to suffer from mental diseases.

There seems to be a significant racial difference among patients with mental diseases. Black/African American patients seem more likely to suffer from mental diseases. However, no conclusion can be drawn without normalizing the number of patients by the population of each racial group.

1.3.2 Normalizing Patients Number with the Corresponding Racial Population

  • The Estimated Population of NYC in 2014:
    • Total Population: 8405837
    • White: 33% - 2773926
    • Black/African American: 23% - 1933342
    • Other Race: 44% - 3698568
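The normalization itself is a simple division of patient counts by group population. The sketch below uses the population estimates listed above; the patient counts are hypothetical placeholders (deliberately identical across groups), not values from SPARCS. It only illustrates that the same raw count translates into different rates once group size is taken into account.

```r
# Population estimates by racial group, as listed in this section.
population <- c("White" = 2773926,
                "Black/African American" = 1933342,
                "Other Race" = 3698568)

# Hypothetical patient counts, identical on purpose; NOT taken from the data.
patients <- c("White" = 10000,
              "Black/African American" = 10000,
              "Other Race" = 10000)

# Patients per 1,000 residents of each group.
rate_per_1000 <- patients / population * 1000
print(round(rate_per_1000, 2))
```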

Conclusion

There is a significant racial difference in hospitalizations for mental diseases and disorders in NYC. The percentage of Black/African Americans with mental problems is almost twice that of other races. A statistical analysis of the racial difference will be performed on all the data from 2009-2014 in Section 2.

1.3.3. Other bivariate and multivariate plots to explore the data set.

  • Group patients data by age and mental diseases type
df14.fc_by_age_disease <- df14.mental %>%
    group_by(Age.Group, APR.DRG.Description) %>%
    summarise(mean_days = mean(as.numeric(Length.of.Stay)),
              mean_costs = mean(as.numeric(Total.Costs)),
              sum_costs = sum(as.numeric(Total.Costs)),
              n = n()) %>%
    arrange(Age.Group)
  • Data Visualization - The difference in the number of patients by disease type and age group

  • Explore the economic burden by different mental diseases and age groups - Heat Map

  • Group patients data by race and mental diseases type
df14.fc_by_race_disease <- df14.mental %>%
    group_by(Race, APR.DRG.Description) %>%
    filter(Race != "Multi-racial") %>%
    summarise(mean_days = mean(as.numeric(Length.of.Stay)),
              mean_costs = mean(as.numeric(Total.Costs)),
              sum_costs = sum(as.numeric(Total.Costs)),
              n = n()) %>%
    arrange(Race)
  • Data Visualization - The difference in the number of patients by race and mental disease type

  • Explore the economic burden by different mental diseases and races - Heat Map

  • Explore the economic burden by different age and racial groups - Heat Map

All these explorations reveal the high economic burden of Schizophrenia among Black/African Americans in NYC in 2014. Section 2 explores the economic burden by geographic region of NYC.

1.4 Explore the temporal patterns of mental health hospitalization in NYC

  • Compare the day of hospitalization admission for mental diseases and disorders in NYC

The plot of hospital admission days shows an interesting pattern. The number of patients admitted is much higher on weekdays than on weekends. The trend goes up from Monday to Wednesday and down from Wednesday to Friday. It then drops significantly on Saturday and goes even lower on Sunday. This trend matches the rhythm of work pressure in daily life, and it suggests that mental illness in NYC is likely to be triggered by work and study pressure.
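The weekday pattern can be tabulated as sketched below. The column name `Admit.Day.of.Week` and its three-letter labels are assumptions about the SPARCS file, and the rows are invented. The key step is turning the day into a factor with explicit levels, so that the counts come out in calendar order rather than alphabetical order.

```r
# Toy data; the column name and day labels are assumed, the rows are made up.
toy <- data.frame(
  Admit.Day.of.Week = c("MON", "TUE", "TUE", "WED", "WED", "WED",
                        "THU", "FRI", "SAT", "SUN")
)

# Explicit levels keep the days in calendar order.
day_levels <- c("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")
toy$Admit.Day.of.Week <- factor(toy$Admit.Day.of.Week, levels = day_levels)

# Admissions per day as a plain named vector.
by_day <- table(toy$Admit.Day.of.Week)
counts <- setNames(as.vector(by_day), names(by_day))

# Average admissions per weekday vs. per weekend day.
is_weekend <- names(counts) %in% c("SAT", "SUN")
mean_weekday <- mean(counts[!is_weekend])
mean_weekend <- mean(counts[is_weekend])
print(counts)
```

Comparing per-day averages rather than totals avoids the bias from having five weekdays but only two weekend days.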

Final Conclusion

Racial and gender differences are two important factors in the 2014 dataset. After this exploration, we process and analyze all the datasets from 2009 to 2014. The next section focuses on three factors: race, gender and geographic region of NYC.


Section 2: Explore NYC mental disease inpatient data from 2009-2014

2.1. ANOVA - to assess the importance of the Race, Gender and Hospital County factors for NYC mental diseases and their economic burden.

##                               Df    Sum Sq Mean Sq F value   Pr(>F)    
## Race                           2   2933531 1466765   3.120   0.0445 *  
## Hospital.County                4  13639464 3409866   7.254 9.08e-06 ***
## Gender                         2   1547778  773889   1.646   0.1932    
## Race:Hospital.County           8   6916519  864565   1.839   0.0661 .  
## Race:Gender                    2    220224  110112   0.234   0.7912    
## Hospital.County:Gender         4    828332  207083   0.441   0.7793    
## Race:Hospital.County:Gender    8    390637   48830   0.104   0.9991    
## Residuals                   1138 534911505  470045                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion: Race and Hospital.County both have statistically significant effects in our dataset. Next, we plot them separately and perform post-hoc analysis.
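For readers who want to reproduce a table of this shape, the sketch below fits a full three-factor ANOVA with `aov()` on synthetic data. The full-factorial formula is inferred from the rows of the table above, since the original call is not shown; the data are random Poisson counts, so the F and p values it prints carry no meaning. Only the structure of the table (main effects, two-way and three-way interactions, residuals) is comparable.

```r
set.seed(1)

# Synthetic balanced design: 3 races x 5 counties x 2 genders x 4 replicates.
toy <- expand.grid(
  Race = c("Black/African American", "Other Race", "White"),
  Hospital.County = c("Bronx", "Kings", "Manhattan", "Queens", "Richmond"),
  Gender = c("F", "M"),
  rep = 1:4,
  stringsAsFactors = TRUE
)

# Random counts, for illustration only (not SPARCS data).
toy$Patients.Number <- rpois(nrow(toy), lambda = 200)

# "*" expands to all main effects and all interactions.
fit <- aov(Patients.Number ~ Race * Hospital.County * Gender, data = toy)

# summary() returns a list; its first element is the ANOVA table.
anova_table <- summary(fit)[[1]]
print(anova_table)
```

The degrees of freedom follow from the design: a factor with k levels contributes k - 1, and an interaction contributes the product of its factors' degrees of freedom when every combination of levels is observed.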

2.2 Plot the relationship of Race, Age Groups and Patient Numbers

2.2.1 Plot without normalizing to each racial population

2.2.2 Plot with normalizing to each racial population

2.2.3 Post-hoc analysis of the factor: Race

##               Df    Sum Sq Mean Sq F value Pr(>F)  
## Race           2   2933531 1466765   3.062 0.0471 *
## Residuals   1166 558454460  478949                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Patients.Number ~ Race, data = NYC.mental.new)
## 
## $Race
##                                         diff       lwr      upr     p adj
## Other Race-Black/African American  -74.82015 -192.5655 42.92520 0.2954996
## White-Black/African American      -121.43272 -237.2946 -5.57088 0.0373768
## White-Other Race                   -46.61257 -162.2383 69.01317 0.6112210
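The post-hoc step can be sketched as follows: fit a one-way model with `aov()`, then pass it to `TukeyHSD()`. The formula matches the "Fit:" line in the output above, but the data here are synthetic (normal draws with group means chosen arbitrarily), so the numbers it prints are not results.

```r
set.seed(2)

# Synthetic data, for illustration only; group means are arbitrary.
toy <- data.frame(
  Race = factor(rep(c("Black/African American", "Other Race", "White"),
                    each = 20)),
  Patients.Number = c(rnorm(20, mean = 300, sd = 50),
                      rnorm(20, mean = 250, sd = 50),
                      rnorm(20, mean = 200, sd = 50))
)

# One-way ANOVA followed by Tukey's honest significant difference test.
fit <- aov(Patients.Number ~ Race, data = toy)
tukey <- TukeyHSD(fit, "Race", conf.level = 0.95)

# One row per pair of groups: difference in means, interval bounds, adjusted p.
print(tukey$Race)
```

Each row is labeled "A-B" and reports the mean of A minus the mean of B. A pair differs significantly at the 5% family-wise level when its interval (`lwr`, `upr`) excludes zero, which corresponds to `p adj` below 0.05. In the report's output, only the White vs. Black/African American comparison meets that criterion.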

2.2.4 Plot without normalizing patients number to each racial population at county level

2.2.5 Plot with normalizing patients number to each racial population at county level

2.2.6 Post-hoc analysis of the factor: Hospital County

##                 Df    Sum Sq  Mean Sq F value Pr(>F)    
## Hospital.County  4 254240272 63560068   38.79 <2e-16 ***
## Residuals       85 139263381  1638393                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Total.Patients ~ Hospital.County, data = NYC.fc_by_race_county)
## 
## $Hospital.County
##                          diff        lwr        upr     p adj
## Kings-Bronx         2298.7222  1109.5199  3487.9246 0.0000061
## Manhattan-Bronx     3101.0000  1911.7977  4290.2023 0.0000000
## Queens-Bronx         647.5000  -541.7023  1836.7023 0.5540743
## Richmond-Bronx     -1640.0000 -2829.2023  -450.7977 0.0021225
## Manhattan-Kings      802.2778  -386.9246  1991.4801 0.3357701
## Queens-Kings       -1651.2222 -2840.4246  -462.0199 0.0019418
## Richmond-Kings     -3938.7222 -5127.9246 -2749.5199 0.0000000
## Queens-Manhattan   -2453.5000 -3642.7023 -1264.2977 0.0000013
## Richmond-Manhattan -4741.0000 -5930.2023 -3551.7977 0.0000000
## Richmond-Queens    -2287.5000 -3476.7023 -1098.2977 0.0000068

Conclusion: There are significant racial and county differences in the utilization of inpatient mental health service in NYC.

2.2.7 Map NYC mental diseases and disorders hospitalization at County Level

## OGR data source with driver: ESRI Shapefile 
## Source: "nybb_16c", layer: "nybb"
## with 5 features
## It has 4 fields
  • MAP1 - Map Patients Number (per 1,000 Population) with Mental Problems
  • MAP2 - Map The Cumulative Total Costs for Mental Health Treatment at County Level
  • MAP3 - Map The Averaged Charges for Mental Health Treatment at County Level

Section 3: Final Plots

Final plot 1 - The Racial Difference in Mental Illness in NYC from 2009 - 2014

There is a significant racial difference in hospitalizations for mental diseases and disorders in NYC.

From 2009 to 2014, Black/African Americans had the highest number of mental disorder hospitalizations in almost every age group compared with other racial groups. The post-hoc analysis in Section 2 also showed that Black/African Americans have a significantly higher number of mental disease hospitalizations than Whites (adjusted p = 0.0374). Final plot 1 shows the percentage of patients with mental diseases in each racial group. Because it is normalized by the population of each racial group in NYC, it presents the striking differences directly. In summary, the rate of mental disease patients in the population is 0.6% higher for Black/African Americans than for Whites in NYC.

I looked for potential reasons. One possible explanation is genetic difference; however, I did not find much evidence to support this assumption. Another possible reason is bias in the diagnosis of mental disorders, meaning that one race is more likely to be diagnosed with severe mental disorders. Some studies show that, for the same symptoms, a Black/African American is more likely to be diagnosed with schizophrenia where a White American is diagnosed with depression. However, this does not explain my results, since I grouped all of these mental diseases together. I will explore the data further to see whether I can find a reasonable explanation for the racial difference.

Final plot 2 - the Day of Hospitalization Admission for Mental Illness in NYC 2014

In Final plot 2, the number of patients admitted is much higher on weekdays than on weekends. The trend goes up from Monday to Wednesday and down from Wednesday to Friday. It then drops significantly on Saturday and goes even lower on Sunday. This trend matches the rhythm of work pressure in daily life, and it suggests that mental illness in NYC is likely to be triggered by work and study pressure.

Final plot 3 - Map The Cumulative Total Costs for Mental Health Treatment at County Level

Among the five counties of New York City, Manhattan does not have the largest population, yet it has the largest number of patients and the highest economic burden from mental problems. When normalized by county population, the difference gets even bigger: Manhattan has many more patients with mental diseases and disorders per 10,000 population than the other counties in NYC.

This result is not surprising. The dense population and high living pressure in Manhattan might be the leading causes of this difference. Based on this observation, more funding and services should be directed to Manhattan to improve the mental health of New Yorkers.

In addition, since the datasets do not have complete patient zip code information, I used the hospital county to analyze the economic burden at the county level. I wonder whether this result could be caused by the density and capacity of mental hospitals in Manhattan compared with other counties.

The gender difference is another interesting observation. I debated whether to include Maternal Depression in the total mental health data, since it might cause gender bias in the final results. However, even with Maternal Depression included, male adults still have a much higher rate of hospitalization for mental problems. This also suggests that work and family pressure might be one of the leading causes of mental problems in NYC.


Final Reflections

This dataset is limited in many ways. First, the patient hospitalization information in this dataset does not account for readmission. Patients with mental diseases and disorders have a high readmission rate, but the dataset does not identify readmissions. This missing factor might introduce bias, and interesting observations might be missed without considering readmission.

In addition, for confidentiality reasons, the zip code information for patients is incomplete (many values are missing). For the records that do have a patient zip code, only the first 3 digits are shown, which makes it impossible to map the mental health profile at the community level. New York City is a large and ethnically diverse metropolis. Analysis at the county level does not provide enough information to reflect the health situation, given the diverse demographic characteristics of each community. I will continue this project with more detailed data and hope to map a better NYC mental health profile.

Future work: analyze the location and capacity of mental hospitals in NYC at the county level, to get a better idea of how funding for mental health problems is distributed in NYC.